Although query-based systems (QBS) have become one of the main solutions to share data anonymously, building QBSes that robustly protect the privacy of individuals contributing to the dataset is a hard problem. Theoretical solutions relying on differential privacy guarantees are difficult to implement correctly with reasonable accuracy, while ad-hoc solutions might contain unknown vulnerabilities. Evaluating the privacy provided by QBSes must thus be done by evaluating the accuracy of a wide range of privacy attacks. However, existing attacks require time and expertise to develop, need to be manually tailored to the specific systems attacked, and are limited in scope. In this paper, we develop QuerySnout (QS), the first method to automatically discover vulnerabilities in QBSes. QS takes as input a target record and the QBS as a black box, analyzes its behavior on one or more datasets, and outputs a multiset of queries together with a rule to combine answers to them in order to reveal the sensitive attribute of the target record. QS uses evolutionary search techniques based on a novel mutation operator to find a multiset of queries likely to lead to an attack, and a machine learning classifier to infer the sensitive attribute from answers to the queries selected. We showcase the versatility of QS by applying it to two attack scenarios, three real-world datasets, and a variety of protection mechanisms. We show the attacks found by QS to consistently match or outperform, sometimes by a large margin, the best attacks from the literature. We finally show how QS can be extended to QBSes that require a budget, and apply QS to a simple QBS based on the Laplace mechanism. Taken together, our results show how powerful and accurate attacks against QBSes can already be found by an automated system, allowing for highly complex QBSes to be automatically tested "at the pressing of a button".
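The Laplace-based QBS mentioned above, and the kind of query-combination rule QS searches for, can be illustrated with a minimal sketch (hypothetical names, not QuerySnout's actual implementation): a counting query answered with Laplace noise of scale 1/ε, and an averaging attack that combines repeated answers to cancel the noise when no privacy budget is enforced.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_qbs(dataset, predicate, epsilon=1.0):
    """Answer a counting query with Laplace noise (sensitivity 1)."""
    true_count = sum(predicate(r) for r in dataset)
    return true_count + rng.laplace(scale=1.0 / epsilon)

# Toy dataset: (age, has_condition) records.
records = [(34, 1), (45, 0), (29, 1), (61, 0), (50, 1)]
query = lambda r: r[1] == 1  # "how many users have the condition?"

# Averaging attack: without a budget, repeating the same query and
# averaging the noisy answers recovers the true count (here, 3).
answers = [laplace_qbs(records, query, epsilon=1.0) for _ in range(2000)]
estimate = np.mean(answers)
print(round(estimate))
```

A budget-enforcing QBS would refuse or charge for the 2,000 repeated queries, which is why the paper's extension to budgeted QBSes needs query multisets rather than naive repetition.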
Inductive logic programming (ILP) is a form of machine learning based on mathematical logic that generates logic programs from given examples and background knowledge. In this project, we extend the Popper ILP system to make use of multi-task learning. We implement the state-of-the-art approach and several new strategies to improve search performance. In addition, we introduce constraint preservation, a technique that improves the overall performance of all approaches. Constraint preservation allows the system to transfer knowledge between updates of the background knowledge set, thereby reducing the amount of repeated work the system performs. Moreover, constraint preservation allows us to transition from the current state-of-the-art iterative deepening search approach to a more efficient breadth-first search approach. Finally, we experiment with curriculum learning techniques and show their potential benefit to the field.
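The idea behind constraint preservation can be sketched with a hypothetical toy generate-and-test loop (this is not Popper's actual code): hypotheses are feature sets, a hypothesis covers an example if the example's features are a subset of it, and any hypothesis found to cover a negative example becomes a constraint that is kept across tasks, so the pruned part of the search space is never re-tested.

```python
from itertools import combinations

FEATURES = ("a", "b", "c", "d")
# Hypothesis space: all non-empty feature subsets, smallest first.
SPACE = [frozenset(c) for n in range(1, 5) for c in combinations(FEATURES, n)]

def learn(tasks, preserve=True):
    pruned, tested, found = set(), 0, {}
    for name, (pos, neg) in tasks.items():
        if not preserve:
            pruned = set()          # forget constraints between tasks
        for h in SPACE:
            if h in pruned:
                continue            # constraint preserved: no re-test needed
            tested += 1
            if any(e <= h for e in neg):
                pruned.add(h)       # new constraint: h covers a negative
            elif all(e <= h for e in pos):
                found[name] = h     # consistent hypothesis found
                break
    return found, tested

neg = [frozenset("c"), frozenset("d")]                  # shared background
tasks = {"t1": ([frozenset("ab")], neg),
         "t2": ([frozenset("a"), frozenset("b")], neg)}
_, with_preservation = learn(tasks, preserve=True)
_, without_preservation = learn(tasks, preserve=False)
print(with_preservation, without_preservation)
```

With preservation the second task skips the hypotheses already ruled out by the shared negative examples, testing fewer candidates overall; in a real ILP system the savings compound across many background-knowledge updates.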
This work proposes a trainable transformer architecture for user-level classification of gambling addiction and depression. In contrast to other approaches that operate at the post level, we process a set of social media posts from a given individual, in order to exploit the interactions between posts and eliminate label noise at the post level. We exploit the fact that, by not injecting positional encodings, multi-head attention is permutation invariant, and we process randomly ordered sets of texts from a user after encoding them with modern pretrained encoders (RoBERTa / MiniLM). Moreover, our architecture is interpretable with modern feature attribution methods and allows for automatic dataset creation by identifying discriminative posts in a user's set of texts. We perform an ablation study of the hyperparameters and evaluate our method on the eRisk 2022 Lab tasks on early detection of signs of pathological gambling and early risk detection of depression. The method proposed by our team BLUE obtained the best ERDE5 score of 0.015 and the second-best ERDE50 score of 0.009 for pathological gambling detection. For the early detection of depression, we obtained the second-best ERDE50 of 0.027.
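The permutation-invariance property the abstract relies on can be checked numerically with a minimal NumPy sketch (a single attention head over toy post embeddings, not the paper's actual model): without positional encodings, self-attention is permutation-equivariant, so a mean-pooled user representation is identical for any ordering of the posts.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                       # embedding dimension (e.g. from a pretrained encoder)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def user_representation(posts):
    """Single-head self-attention over a user's post embeddings
    (no positional encodings), mean-pooled to one user-level vector."""
    Q, K, V = posts @ Wq, posts @ Wk, posts @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))
    return (attn @ V).mean(axis=0)

posts = rng.normal(size=(5, d))             # 5 post embeddings
shuffled = posts[rng.permutation(5)]        # same posts, different order
same = np.allclose(user_representation(posts), user_representation(shuffled))
print(same)
```

Permuting the input rows permutes the attention output rows in the same way, and the mean over rows erases that permutation, which is what allows the model to consume a user's posts as an unordered set.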
Machine learning models are increasingly used by businesses and organizations around the world to automate tasks and decision-making. Trained on potentially sensitive datasets, machine learning models have been shown to leak information about individuals in the dataset as well as global dataset information. We here take research on dataset property inference attacks against ML models one step further by proposing a new attack: a dataset correlation inference attack, where the attacker's goal is to infer the correlations between the input variables of a model. We first show that an attacker can exploit the spherical parametrization of correlation matrices to make an informed guess. This means that, using only the correlations between the input variables and the target variable, an attacker can infer the correlation between two input variables better than a random guess baseline. We propose a second attack that exploits access to the machine learning model using shadow modeling to improve the guess. Our attack uses Gaussian copula-based generative modeling to generate synthetic datasets with a wide variety of correlations in order to train a meta-model for the correlation inference task. We evaluate our attacks against logistic regression and multilayer perceptron models and show the model-based attack to outperform the model-less attack. Our results show that the accuracy of the machine learning-based attack decreases with the number of variables and converges to the accuracy of the model-less attack. However, correlations between input variables that are highly correlated with the target variable remain more vulnerable regardless of the number of variables. Our work bridges the gap between what can be considered global leakage about the training dataset and individual-level leakage. When coupled with marginal leakage attacks, it might also constitute a first step towards dataset reconstruction.